* =============================================================================
* File Name: 2016_EC_EnvRegulation--CWSAnalysis_ASMMerge_MainFile_FullSample_Aug23.do

* File Description: This file performs the main analysis on the CWS. Each regression
* is run separately for PM 2.5, PM 10, NOx, and VOC polluters.  

* This file uses the analysis dataset created with the CMA definitions from the 
* ASM.  

* Notes: The data used in this analysis (npri_asm_envreg_ASMAirQuality_analysis_OnlyContempEmitters.dta) 
* was created using the CWS data cleaning do file 
* (2016_EC_EnvRegulation--CWSDataCleaning_OnlyContempEmitters_useASMCMA.do). That 
* file does the majority of the cleaning and data creation. Additional trimming and
* variable creation are done herein.  

* Creation date: June 7, 2016

* This version: March 6, 2019

* Author: Nouri Najjar
* =============================================================================

* Specify Pollutant
local x = "PM25"

* -----------------------------------------------------------------------------
* Section 1: Call the npri-asm data with air quality from the cder server. 

* This calls the npri-asm data (with CMA-year level air quality) that has been
* prepared for the analysis. To save memory space, all non-polluters are dropped.
* -----------------------------------------------------------------------------
* Set server folder. 
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\

* Open dataset. Use the ASM CMA Matched Data. 
use "`datadir'asm_npri_2002to2012`x'_withAQ_analysis.dta", clear
* -----------------------------------------------------------------------------

* -----------------------------------------------------------------------------
* Section 2: Set stata parameters.  
* -----------------------------------------------------------------------------
* Set matsize to 2000.
set matsize 2000
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 3: Trim sample and create new variables  
* -----------------------------------------------------------------------------

*******
* Restrict Sample to Pollutant of Interest
*******
gen PM25=(cas_number=="NA - M10")
gen NOx=(cas_number=="11104-93-1")

keep if `x'==1
*******	 

*******
* Restrict Sample to 2004-2010
*******
keep if yr4>=2004 & yr4<=2010
*******

*******
* Create new industry variable that takes NAICS3 for all industries except
* 4-digit treated industries.
gen Industry = naics3
replace Industry = naics4 if naics4>= 3270 & naics4<=3279
replace Industry = naics4 if naics4>= 3240 & naics4<=3249

* Generate new industry year indicator
egen Ind3Year = group(Industry yr4)
*******

*******
* Drop all plants that either change cma over the sample or change industry
*******
bysort NPRI_ID: egen Mean_CMA=mean(CMA_ID_Merge)
gen ChangeCMA = cond(Mean_CMA!=CMA_ID_Merge,1,0)

bysort NPRI_ID: egen Mean_Industry=mean(Industry)
gen ChangeIndustry = cond(Mean_Industry!=Industry,1,0)

* Drop if change cma
drop if ChangeCMA==1
drop ChangeCMA
* drop if change industry
drop if ChangeIndustry==1
drop ChangeIndustry
*******


*******
* Create common sample across all regressions
*******
* Drop non-zero pollution
drop if missing(LN_total_air_converted)==1

* Drop all plants with missing sales or employment
drop if missing(LN_real_vsm)==1
drop if missing(LN_totalemp)==1

* Drop plants with missing VA or VA<0
drop if missing(LN_real_vam)==1 
drop if real_vam<0

* Drop all plants with missing CMA information
drop if missing(CMA_ID_Merge)==1

* Drop any plants with multiple ASM observations in a given year. 
bysort NPRI_ID cas_number yr4: gen dupLocNo = cond(_N==1,0,_n)
drop if dupLocNo>1
*******

*******
* Drop CMAs with 1 industry and industries with 1 cma
*******
* Tag distinct values
egen tag = tag(CMA_ID_Merge Industry)

* Drop if only one industry in CMA
egen distinct_CMA = total(tag), by(CMA_ID_Merge)
drop if distinct_CMA==1

* Drop if only one CMA in industry
egen distinct_Industry = total(tag), by(Industry)
drop if distinct_Industry==1
drop tag 
*******

*******
* Drop plants with only one year for each pollutant
*******
bysort NPRI_ID cas_number: egen NumberYears_PolPlant = count(NPRI_ID)
drop if NumberYears_PolPlant==1
*******

*******	
* CMA-Industry indicator for clustering standard errors
*******	
egen CMA_Ind3 = group(CMA_ID_Merge Industry)
*******	

*******	 
* Recode missing treatment plants as control
*******	 
recode PM25Treated (.=0)
recode O3Treated (.=0)
*******	 
 

*******	
* Create and scale values for table
*******	
* Sales scaled by 1000000
gen Sales_sc = real_vsm*0.000001
* Value added scaled by 1000000
gen VA_sc = real_vam*0.000001
* VA per worker scaled
gen VAPerWorker_sc = VAPerWorker*0.001
* Energy
gen Energy_sc = heatpr*0.001
* Domestic sales scaled
gen DomesticRevenue_sc = DomesticRevenue*0.000001
* Own province sales scaled by 1000000
gen OwnProvRevenue_sc = OwnProvRevenue*0.000001
* Other province sales scaled by 1000000
gen OtherProvRevenue_sc = OtherProvRevenue*0.000001
* Foreign sales scaled
gen ExportRevenue_sc = ExportRevenue*0.000001
*******	

*******
* Generate Firm Size Measure - first year in data, removing industry and cohort FEs
*******
* Create first year variable for pollutant-plant
bysort NPRI_ID cas_number: egen FirstYear = min(yr4)

***
* Size Measure
***
* VA per worker as size
bys NPRI_ID: gen firstyear_size = VAPerWorker if yr4==FirstYear
***

* Remove FEs
quietly areg firstyear_size i.FirstYear if yr4==FirstYear, absorb(naics3)
predict p_sizestore, r
bysort NPRI_ID (p_sizestore): gen p_size = p_sizestore[1]

* Create quartiles
xtile qp_size = p_size, n(3)
quietly tabulate qp_size, g(qp_size_q)

*Gen Treatment Variables by Size
forvalues i = 1(1)3 {
gen PM25Treated_q`i'=PM25Treated*qp_size_q`i'
}
forvalues i = 1(1)3 {
gen O3Treated_q`i'=O3Treated*qp_size_q`i'
}
*******

*******	
* Only keep regions with air quality data
*******	
keep if missing(PM25_CMA)==0
*******	

***
* Track unique firm observations. Used in summary stats. 
***
bysort NPRI_ID: gen dupPlant = cond(_N==1,0,_n)
***

***
* Indicator for exiting plants. Used in summary stats. 
***
bysort NPRI_ID: egen LastYearAlive = max(yr4)
bysort NPRI_ID: egen EverExit = total(cond(LastYearAlive==yr4 & yr4!=2010,1,0))
***
* -----------------------------------------------------------------------------



/* COMMENTED OUT TO SURPRESS FREQUNCIES FOR VETTING. 
* -----------------------------------------------------------------------------
* Section 5: crosstabs to show sample size in each binary variable. 
* -----------------------------------------------------------------------------
***
* Treatment status (key independent variables)
***
* Number of observations by treatment status, PM 2.5 standard
tab PM25Treated
* Number of observations by treatment status, O3 standard
tab O3Treated
***

***
* Treatment status by R&D indicator
***
* R&D by PM standard
tab RD_Indicator PM25Treated
* R&D by O3 standard
tab RD_Indicator O3Treated
***
	   
***
* Treatment status for R&D subsample (a small number of firms report positive R&D spending)
***
* Number of observations by treatment status, PM 2.5 standard
tab PM25Treated if missing(LN_RD)==0

* Number of observations by treatment status, O3 standard
tab O3Treated if missing(LN_RD)==0   
***

***
* Treatment status by productivity bin
***
* Number of observations by treatment status, PM 2.5 standard
tab PM25Treated_q1
tab PM25Treated_q2
tab PM25Treated_q3

* Number of observations by treatment status, O3 standard
tab O3Treated_q1
tab O3Treated_q2
tab O3Treated_q3
***

***
* Number of observations in each bin of the bin scatterplot
tab ScatterBins
***
* -----------------------------------------------------------------------------
*/


* -----------------------------------------------------------------------------
* Section 6: estimate the effect of the CWS on pollution, output (sales and value added),
* and pollution intensity (pollution/sales and pollution/value added).

* Performs main linear DDD regressions. 
* The regression is a triple difference estimation, with plant, industry-by-year,
* and cma-by-year fixed effects. Standard errors are clustered by CMA-Industry.
* -----------------------------------------------------------------------------
* Pollution
reghdfe LN_total_air_converted PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'Emissions_Both
reghdfe LN_real_vsm PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'Sales_Both
reghdfe LN_real_vam PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'VA_Both
reghdfe LN_PollutionPerSales PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'EmissionSales_Both
reghdfe LN_PollutionPerVA PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'EmissionVA_Both


* Print all results in a tex file. 
esttab `x'Emissions_Both `x'Sales_Both `x'VA_Both `x'EmissionSales_Both `x'EmissionVA_Both using 2016_EC_EnvRegulation--`x'MainResults_March2019.xlsx, csv ///
	   keep(PM25Treated O3Treated) b(%9.3f) se(%9.3f)  r2 obslast star(* 0.10 ** 0.05 *** 0.01) coeflabel(PM25Treated "PM 2.5 Standard" O3Treated "O3 Standard") ///
	   mtitles("Emissions" "Sales" "Value Added" "Pollution/Sales" "Pollution/VA") nonotes addnotes("Notes: Standard errors clustered by CMA-industry.") fragment replace 
	   
* Save data
keep if e(sample)
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\
saveold "`datadir'2016_EC_EnvRegulation--`x'MainResultsALL_March2019.dta", replace
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 7: estimate the effect of the CWS on mechanisms. 
* -----------------------------------------------------------------------------
* Employment
reghdfe LN_totalemp PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_Employment
* VA per worker
reghdfe LN_VAPerWorker PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_VAPerWorker
* Production materials
preserve
reghdfe LN_prodmat PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_ProdMat
keep if e(sample)
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\
saveold "`datadir'2016_EC_EnvRegulation--`x'Mechanisms_Col3_March2019.dta", replace
restore
* Energy
preserve
reghdfe LN_heatpr PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_Energy
keep if e(sample)
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\
saveold "`datadir'2016_EC_EnvRegulation--`x'Mechanisms_Col4_March2019.dta", replace
restore
* R&D
preserve
reghdfe LN_RD PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_RD
keep if e(sample)
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\
saveold "`datadir'2016_EC_EnvRegulation--`x'Mechanisms_Col5_March2019.dta", replace
restore
* R&D propensity
reghdfe RD_Indicator PM25Treated O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_ProbRD


* Print all mechanism results in a tex file. 
esttab `x'_Employment `x'_VAPerWorker `x'_ProdMat `x'_Energy `x'_RD `x'_ProbRD using 2016_EC_EnvRegulation--`x'Mechanisms_March2019.xlsx, csv ///
	   title("Mechanisms") keep(PM25Treated O3Treated) b(%9.3f) se(%9.3f)  r2 obslast star(* 0.10 ** 0.05 *** 0.01) coeflabel(PM25Treated "PM 2.5 Standard" O3Treated "O3 Standard") ///
	   mtitles("Employment" "VA/Worker" "Production Materials" "Energy" "RD Spending" "Pr(RD)") nonotes addnotes("Notes: Standard errors clustered by CMA-industry.") fragment replace 	   
* -----------------------------------------------------------------------------



* -----------------------------------------------------------------------------
* Section 8: Estimate the effect of the CWS on pollution, output (sales and value added),
* and pollution intensity (pollution/sales and pollution/value added) by productivity
* level.
* -----------------------------------------------------------------------------
* Pollution
reghdfe LN_total_air_converted PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'Emissions_Het
* Sales
reghdfe LN_real_vsm PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'Sales_Het
* VA
reghdfe LN_real_vam PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'VA_Het
* Pollution/Sales
reghdfe LN_PollutionPerSales PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'PolPerSales_Het
* Pollution/VA
reghdfe LN_PollutionPerVA PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'PolPerVA_Het

* Print all heterogeneity results in a tex file. 
esttab `x'Emissions_Het `x'Sales_Het `x'VA_Het `x'PolPerSales_Het `x'PolPerVA_Het using 2016_EC_EnvRegulation--`x'MainResults_byFirmSize_March2019.xlsx, csv ///
	   title("Main Results by Plant Productivity)") keep(PM25Treated_q1 PM25Treated_q2 PM25Treated_q3 O3Treated) b(%9.3f) se(%9.3f) ///
	   r2 obslast star(* 0.10 ** 0.05 *** 0.01) coeflabel(PM25Treated_q1 "PM 2.5 Standard x Q1" PM25Treated_q2 "PM 2.5 Standard x Q2" PM25Treated_q3 "PM 2.5  Standard x Q3" ///
	   O3Treated "O3 Standard") ///
	   mtitles("Emissions" "Sales" "VA" "Pollution/Sales" "Pollution/VA") nonotes addnotes("Notes: Standard errors clustered by CMA-industry.") fragment replace 	   
* -----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 9: estimate the effect of the CWS by productivity quantile for key 
* mechanisms. 
* -----------------------------------------------------------------------------
* Employment
reghdfe LN_totalemp PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'Employment_Het
* VA per worker
reghdfe LN_VAPerWorker PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_VAPerWorker_Het
* Production materials
reghdfe LN_prodmat PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_ProdMat_Het
* Energy
reghdfe LN_heatpr PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_Energy_Het
* R&D propensity
reghdfe RD_Indicator PM25Treated_q1-PM25Treated_q3 O3Treated [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year) vce(cluster CMA_Ind3)
estimates store `x'_ProbRD_Het

* Print all heterogeneity results in a tex file. 
esttab `x'Employment_Het `x'_VAPerWorker_Het `x'_ProdMat_Het `x'_Energy_Het `x'_ProbRD_Het using 2016_EC_EnvRegulation--`x'Mechanisms_byFirmSize_March2019.xlsx, csv ///
	   title("Main Results by Plant Productivity)") keep(PM25Treated_q1 PM25Treated_q2 PM25Treated_q3 O3Treated) b(%9.3f) se(%9.3f) ///
	   r2 obslast star(* 0.10 ** 0.05 *** 0.01) coeflabel(PM25Treated_q1 "PM 2.5 Standard x Q1" PM25Treated_q2 "PM 2.5 Standard x Q2" PM25Treated_q3 "PM 2.5  Standard x Q3" ///
	   O3Treated "O3 Standard") ///
	   mtitles("Employment" "VA per Worker" "Materials" "Energy" "pr(RD)") nonotes addnotes("Notes: Standard errors clustered by CMA-industry.") fragment replace 
* ----------------------------------------------------------------------------


* -----------------------------------------------------------------------------
* Section 4: Produce summary statistics
* -----------------------------------------------------------------------------
***
* Section 4a) Means and standard deviations for plant pollution, sales, value added, labour productivity
* and energy.
***
eststo SumStats_`x': estpost summarize total_air_converted Sales_sc VA_sc totalemp VAPerWorker_sc Energy_sc [aw=Weights_`x']

* Print in table
esttab SumStats_`x' using 2016_EC_EnvRegulation--`x'SummaryStats_March2019.xlsx, csv ///
	   main(mean %9.2fc) aux(sd %9.2fc) obslast title("Summary Statistics") coeflabel(total_air_converted "Emissions (tonnes)" Sales_sc "Sales" VA_sc "Value Added" totalemp "Employment" VAPerWorker_sc "VA/Worker)" Energy_sc "Energy") ///
	   mtitles("PM 2.5") nonotes replace 	   
***

***
* Section 4b) Summary stats for counterfactuals
***
* Mean sales in base year by treated and untreated
* PM Standard
eststo PMTreatStats_`x': quietly estpost tabstat Sales_sc [aw=Weights_`x'] if yr4==2004, by(PM25Treated) statistics(mean)
* Ozone Standard
eststo O3TreatStats_`x': quietly estpost tabstat Sales_sc [aw=Weights_`x'] if yr4==2004, by(O3Treated) statistics(mean)

*Mean sales conditional on closing for counterfactuals
* All closers
eststo AllCloseStats_`x': quietly estpost tabstat Sales_sc [aw=Weights_`x'], by(EverExit)  statistics(mean)
* Regulated closers
eststo RegCloseStats_`x': quietly estpost tabstat Sales_sc [aw=Weights_`x'] if PM25Treated==1, by(EverExit)  statistics(mean)

* Print means in table
esttab PMTreatStats_`x' O3TreatStats_`x' AllCloseStats_`x' RegCloseStats_`x' using 2016_EC_EnvRegulation--`x'CounterfactualStats_March2019.xlsx, csv ///
	   main(mean %9.2fc) aux(sd %9.2fc) noobs title("Summary Statistics for Counterfactuals") ///
	   mtitles("PM 2.5 Treatment Status" "O3 Treatment Status" "Ever Closed" "Ever Closed - Regulated") nonotes addnotes("Notes: Each column is a different sample from the NPRI-ASM.") replace 	   

* Print regulated in 2004 counts in table
tab PM25Treated O3Treated if yr4==2004

* Print ever exit by regulated counts
tab EverExit PM25Treated if dupPlant<=1  

* Save data for 2004 summary stats
preserve
keep if yr4==2004
local datadir \\f4cder01\2016_EC_EnvRegulation\DATA\
saveold "`datadir'2016_EC_EnvRegulation--`x'2004SummaryStats_March2019.dta", replace
restore
***

***
* Section 4c) Correlation between plant labour productivity and pollution intensity
***
* Cross-sectional regression
reg LN_PollutionPerSales LN_VAPerWorker [pw=Weights_`x']
estimates store PMbyProductivity
* Regression with plant FEs
reghdfe LN_PollutionPerSales LN_VAPerWorker [pw=Weights_`x'], absorb(NPRI_ID)
estimates store PMbyProductivity_PlantFE
* Regression with all FEs used in main regressions (plant, CMA-Year, Industry-Year)
reghdfe LN_PollutionPerSales LN_VAPerWorker [pw=Weights_`x'], absorb(NPRI_ID Ind3Year CMA_Year)
estimates store PMbyProductivity_AllFE

* Print results in table
esttab PMbyProductivity PMbyProductivity_PlantFE PMbyProductivity_AllFE using 2016_EC_EnvRegulation--`x'PMbyProductivityRegressions_March2019.xlsx, csv ///
	   keep(LN_VAPerWorker) b(%9.3f) se(%9.3f)  r2 obslast star(* 0.10 ** 0.05 *** 0.01) coeflabel(LN_VAPerWorker "ln(VA per Worker)") ///
	   mtitles("Cross-Section" "Plant FE" "All FEs") nonotes addnotes("Notes: Standard errors clustered by CMA-industry.") fragment replace 	  
***

***
* Section 4d) Bin scatter plot of pollution intensity by productivity
***
* Change directory to save graphs. 
local datadir \\f4cder01\2016_EC_EnvRegulation\

* Run bin scatter
binscatter LN_PollutionPerSales LN_VAPerWorker [aw=Weights_`x'], n(14) genxq(ScatterBins) graphregion(color(white)) ytitle("ln(Pollution/Sales)") xtitle("ln(Value Added per Worker)") 
graph save "`datadir'Batchsubmit\2016_EC_EnvRegulation--`x'PollutionIntensityByProductivity_BinScatterplot.gph", replace
***
* -----------------------------------------------------------------------------
* -----------------------------------------------------------------------------


